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FLUSI: A NOVEL PARALLEL SIMULATION TOOL FOR FLAPPING 
INSECT FLIGHT USING A FOURIER METHOD WITH VOLUME 

PENALIZATION^ 
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Abstract. FluSI, a fully parallel open source software for pseudo-spectral simulations of three- 
dimensional flapping flight in viscous flows, is presented. It is freely available for non-commercial 
use under https://github.com/pseudospectators/FLUSI The computational framework runs on high 
performance computers with distributed memory architectures. The discretization of the three- 
dimensional incompressible Navier-Stokes equations is based on a Fourier pseudo-spectral method 
with adaptive time stepping. The complex time varying geometry of insects with rigid flapping wings 
is handled with the volume penalization method. The modules characterizing the insect geometry, 
the flight mechanics and the wing kinematics are described. Validation tests for different benchmarks 
illustrate the efficiency and precision of the approach. Finally, computations of a model insect in the 
turbulent regime demonstrate the versatility of the software. 
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1. Introduction. Flapping flight is an active interdisciplinary research held with 
many open questions, and numerical simulations have become an important instru¬ 
ment for tackling them. Here, we present the computational solution environment 
FluSI which is, to the best of our knowledge, the first open source code in this held. 

In the literature different numerical strategies have been proposed simulating 
flapping flight, e.g., the use of several, overlapping grids [19], that are adapted to 
the geometry, or the use of moving grids with the Arbitrary Lagrangian-Eulerian 
method |29| . Those methods involve significant computational and implementation 
overhead, whose reduction has motivated the development of methods that allow for 
a geometry-independent discretization, such as immersed boundary methods. 

In this work, we employ the volume penalization method to take the no-slip 
boundary condition into account. The idea of modeling solid obstacles as porous 
media with a small permeability has been proposed in |2]. The forcing term acts on 
the entire volume of the solid and not only on its surface, as it is the case in the 
immersed boundary methods, and corresponds to the Darcy drag. The distinctive 
feature of the method is the existence of rigorous convergence proofs mi showing 
that the solution of the penalized equations does indeed converge to the solution of the 
Navier-Stokes equations with no-slip boundary conditions. This approach has been 
extended to model not only Dirichlet conditions applied at the surface of moving, rigid 
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and flexible obstacles caE], but also to homogeneous Neumann conditions, which is 
relevant for studying, e.g., turbulent mixing of a passive scalar m- As animals forage 
following odor traces, this technique can potentially be attractive for insect flight 
simulations as well. An interesting recent development has been proposed in m in 
the context of finite-difference discretizations. Their idea is to modify the fractional 
step projection scheme such that the Neumann boundary condition, as introduced in 
m, appears in the pressure Poisson equation. Another variant, specifically adopted 
to impulsively started flow, is the iterative penalization method proposed in |12| . For 
reviews on immersed boundary and penalization techniques we refer to 

The remainder of this article is organized as follows. First we discuss in sec¬ 
tion the numerical method of FluSI’s fluid module, which is based on a spectral 
discretization of the three-dimensional penalized Navier-Stokes equations. Section 
describes the module that creates the insect geometry, as well as the governing 
equations for free flight simulations, and we focus on the parallel implementation in 
section Thereafter we present a validation of our software solution environment in 
section]^ for different test cases established in the literature, and give an outlook for 
applications to flapping flight in the turbulent regime in section Conclusions are 
drawn in section |3 

2. Numerical method. 

2.1. Model equations. For the numerical simulation of many flapping flyers, 
like insects, the fluid can typically be approximated as incompressible. It is thus 
governed by the incompressible Navier-Stokes equations with the no-slip boundary 
condition on the fluid-solid interface. The numerical solution of this problem poses 
two major challenges. First, the pressure is a Lagrangian multiplier that ensures the 
divergence-free condition. Traditional numerical schemes employ a fractional step or 
projection method US], requiring to solve a Poisson problem for the pressure. Sec¬ 
ond, the no-slip boundary condition must be satisfied on a possibly complicated and 
moving fluid-solid interface, for which boundary-fitted grids are classical approaches 
[35]. Since the generation of these grids may be challenging, alternatives have been 
developed. The principal idea is to extend the computational domain to the interior 
of obstacles. The boundary is then taken into account by adding supplementary terms 
to the Navier-Stokes equation. The technique chosen here is the volume penalization 
method, which is physically motivated by replacing the solid obstacle by a porous 
medium with small permeability Crj. The penalized version of the Navier-Stokes 
equations then reads 


( 2 . 1 ) 

( 2 . 2 ) 

(2.3) 


O I V7TT I V72 ^ ( \ ^ V7 (Xspl^) 

OtU + U= —vn -I- vV u— {u — Us) ~ 7^^ ^ 2 — 

Cyjj Ggp V 

'-V-^ '-V-^ 

penalization sponge 

V • U = 0 

u{x,t = 0) = Mg (x) XGil,t> 0, 


where u is the fluid velocity, w = V x u is the vorticity and v is the kinematic 
viscosity. The sponge term is explained in section 2.3 Eqns. ( 2. Ip. 3 1 are written in 
dimensionless form and the fluid density £»/ is normalized to unity. The nonlinear term 
in eqn. (2.11 is written in the rotational form, hence one is left with the gradient of 
the total pressure 11 = p + ^u-u instead of the static pressure p j^Sj. This formulation 
is chosen because of its favorable properties when discretized with spectral methods. 
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namely conservation of momentum and energy |28l pp. 210]. At the exterior of the 
computational domain, which is supposed to be sufficiently large, periodic boundary 
conditions can be assumed. The mask function y is defined as 


(2.4) 


X (x, t) 


0 ii x&Vlf 
1 if X S Sis ’ 


where Sly is the fluid and Sl^ the solid domain. In anticipation of the application to 
moving boundaries, we note that we will replace the discontinuous y-function by a 
smoothed one, with a thin smoothing layer centered around the interface. The mask 
function encodes all geometric information of the problem, and eqns. ( 2. Ip. 3 1 do not 
include no-slip boundary conditions. 

Note that in the fluid domain Sly one recovers the original equation as the pe¬ 
nalization term ^ (u — u^) vanishes. The convergence proof in II ID shows that the 
solution of the penalized Navier-Stokes equations ( 2. Ip. 3 1 tends for > 0 indeed 
towards the exact solution of Navier-Stokes imposing no-slip boundary conditions, 
with a convergence rate of O in the L^-norm. The parameter should thus 

be chosen to a small enough value, which is also intuitively clear by the physical inter¬ 
pretation of Cjj as permeability. However, the choice of C,, is subject to constraints, 
as the penalized equations are discretized and solved numerically. The modeling error 
of order O should be of the same order as the discretization error jMj. It is 

first noted that in eqn. (2.11, C,, has the dimension of a time. It is instructive to 
put the nonlinear, viscous and pressure terms aside for a moment. One is then left 
with dfU = —ujC-i-i inside the solid, with the obvious solution u = {—t/Cn)- 

Thus, can be directly identified as the relaxation time. Interfering with the time 
step At, usually implying At = O (Cn), this simple fact has important consequences 
for the numerical solution. It indicates that a good choice for Cn is not only “small 
enough”, but also “as large as possible”. Further insight in the properties of the penal¬ 
ization method can be obtained considering the penalization boundary layer of width 
Sn = sJvUn, which forms inside the obstacle Og, as shown in [B]. When increasing the 
resolution, the number of points per boundary layer thickness, K = 5nl should be 
kept constant, which implies the relation Cn <x. (Ax)^, consistent with [24], where the 
penalized Laplace and Stokes operators were analyzed analytically and numerically. 
With the scaling for Cn, one still has to choose the constant K. In fact, for any 
value of K, the method will converge with the same convergence rate, but the error 
offset can be tuned. For the two-dimensional Couette flow, in we reported the 
optimal value of AT = 0.128. In a range of numerical validation tests, including the 
ones presented here, we find K = 0.1 — 0.4 as a good choice which we recommend as 
a guideline for practical applications. 


To satisfy the incompressibility constraint (2.21, a Poisson equation for the pres¬ 


sure is derived by taking the divergence of eqn. (2.11, yielding 


(2.5) 


V^n = V • ( —w XU— {u — u 


By construction of the method, eqn. (2.5) can be solved with periodic boundary 
conditions. 

The aerodynamic force F and the torque moment to, both important quantities 
in the present applications, can be computed from 


(2.6) F=(^ cr-nd'y=-^[ x{u-u,)dV+^[ u^dF 
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(2.7) rn= (j) r x {cr ■ n) dj = f r x x (m — dF + ^ f r x u^dF, 
laQs Jn ™ 

where the additional terms are denoted ‘unsteady corrections’ |36| . 

2.2. Penalization method for moving boundary problems. The volume 
penalization method as discussed so far assumes a discontinuous mask function in 
the form of eqn. (2.4|. When applying the method to a non-grid aligned body, 


for example a circular cylinder, the mask function geometrically approximates the 
boundary to first order in Ax. Results for stationary cylinder using the discontinuous 
mask function are acceptable |31l [33] , but spurious oscillations in the case of moving 
ones are reported HZ]. The reason is that the discontinuous mask can be translated 
only by integer multiples of the grid spacing, and this jerky motion causes large 
oscillations in the aerodynamic forces. Kolomenskiy and Schneider proposed an 
algorithm to shift the mask function in Fourier space instead of physical space. In the 
present work, we employ a different approach, because displacing the mask in Fourier 
space involves (additional) Fourier transforms, which are computationally expensive, 
especially if more than one rigid body is considered, as it is the case here. The idea on 
hand is to directly assume the mask function to be smoothed over a thin smoothing 
layer Eiin]. To this end, we introduce the signed distance function S{x,t) [2Sj, and 
the mask function can then be computed from the signed distance, using 


( 2 . 8 ) 


1 S < —h 

X(d) = '| + -h < S <+h 

0 6 > +h 


where the semi-thickness of the smoothing layer, h, is used. It is typically defined 


relative to the grid size, h = CsmthAa;, thus eqn. (2.81 converges to a Heaviside step 
function as Aa; —)■ 0. Nonetheless, it can be translated by less than one grid point, 
and then be resampled on the Eulerian fluid grid. 


2.3. Wake removal techniques. The penalized Navier-Stokes equations (2.1 


2.3) do, by principle, not include no-slip boundary conditions and furthermore we 
discretize them in a periodic domain H. In such a periodic setting, the wake re¬ 
enters the domain, which is an undesired artifact. To overcome it, a supplementary 
“sponge” penalization term can be added to the vorticity-velocity formulation of the 
Navier-Stokes equations, in order to gradually damp the vorticity The sponge 

penalization parameter Cgp is usually set to a larger value than Cn, typically Cgp = 
10“^. The larger value and its longer relaxation time ensure that if a traveling vortex 
pair enters the sponge region, the leading one is not dissipated too fast, because it 
otherwise could leave the partner orphaned in the domain. Applying the Biot-Savart 
operator to the vorticity-velocity formulation we formally And — X %#. By 
construction the sponge term is divergence-free, which is important since otherwise 
it would contribute to the pressure, which in turn would be modified even in regions 
far away from the sponge due to its nonlocality. Moreover, it leaves the mean flow, 
i.e., the zeroth Fourier mode, unchanged. This technique is well adapted to spectral 
discretizations, since computing the sponge term requires the solution of three Poisson 
problems, which becomes a simple division in Fourier space. A discussion on the 
influence of the vorticity sponge can be found in section and Fig. |6.3| 

Dirichlet conditions on the velocity can be imposed directly with the volume pe¬ 
nalization, which applies for example to channel walls. For simulations in a uniform. 
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unbounded free-stream, we use both techniques; in a small layer at the domain bor¬ 
ders, the Dirichlet condition u = is imposed with the same precision as the actual 
obstacle, and a preceding, thicker sponge layer ensures that the upstream influence 
is minimized jHl- The sponge technique is similar to the “fringe regions” proposed in 
m- However, in m only a velocity sponge has been used, which corresponds to the 
penalization term, i.e., —Xsp {%L ~ Msp) /C'sp- The vorticity sponge idea has the advan¬ 
tage of not requiring a “desired” velocity field u^p, which has to be set a priori. It also 
does not contribute to the pressure, which helps reducing the region of influence. 


2.4. Discretization. The model equations (2.1 1 and (2.5 1 can be discretized 
with any numerical scheme, in particular a Fourier pseudo-spectral discretization can 
be used ED na The general idea is to represent field variables as truncated 
Fourier series, thus in three dimensions we have for any quantity cp (velocity, pressure, 
vorticity) the discrete complex Fourier coefficients ip. They can be computed efficiently 
with the fast Fourier transform (FFT) |7]. The gradient of a scalar q can be obtained 
by multiplying the Fourier coefficients q with the wavevector k = {kx, ky, kz)"^ and 
the complex unit, Vg = ikq. The Laplace operator becomes a simple multiplication 
by — \k\^■ When using, e.g., finite differences, the dominant part of computational 
efforts is spent on solving the Poisson equation in every time step [H]. This is a 
strong motivation to employ a Fourier discretization, as inverting a diagonal operator 
becomes a simple division. Inserting the truncated Fourier series into the model 
equations and requiring that the residual vanishes with respect to all test functions 
(which are identical with the trial functions exp {tk ■ x) ) yields a Galerkin projection 
and results in an evolution equation for the Fourier coefficients of the velocity. The 
nonlinear and penalization terms contain products, which become convolutions in 
Fourier space. To speed-up computation, the products are calculated in physical 
space. This last introduces aliasing errors which are virtually eliminated by the 2/3 
rule [S], meaning that only 2/3 of the Fourier coefficients are retained. Such a mixture 
of spectral and physical computations is generally labeled "pseudo-spectral" and is, 
when de-aliased, equivalent to a Fourier-Galerkin scheme. The code can be run with 
and without dealiasing, but our choice is to conservatively always turn dealiasing on. 
More details on the effect of dealiasing are discussed in |241 p. 318]. 

The spatially discretized equations can then be advanced in time in Fourier space. 
The code provides a classical fourth order Runge-Kutta with explicit treatment of the 
diffusion term, and a second order Adams-Bashforth scheme with integrating factor 

m- 


Besides the fast solution of Poisson problems, the spectral method has the ad¬ 
vantage of not adding numerical diffusion or dispersion to the penalized equation, 
unlike it is the case when discretized with finite differences. Furthermore, most of the 
computational effort is concentrated in the Fourier transforms, which is advantageous 
from a computational point of view. However, the discretization requires the use of 
an equidistant grid, which implies a large number of grid points to resolve the thin 
boundary layers. 

Note that we do not explicitly apply a Neumann-type boundary condition for the 
pressure in our method. We only use periodic boundary conditions in the computa¬ 
tional domain, which are natural for Fourier methods. It was shown in [T] that the 
pressure held of the penalized problem converges to the pressure in the original bound¬ 
ary value problem. The discretization that we use is consistent and stable, therefore 
the numerical solution converges to the exact solution of the continuous penalized 
problem. 


5 




3. Virtual insect model. We previously described the fluid module where the 
geometry is taken into account by the penalization method, which is the interface 
with the insect module described next. 

Insects fly by flapping their wings, which are basically flat with sharp edges and 
operate typically at high angle of attack. In the following, we describe the insect 
framework used in this work in detail, the essential task being to construct x 


which enter eqn. (2.1 1 . The virtual insect consists of a body and two wings, all of 
which are performing solid body rotations around three axes and assumed to be rigid. 
Therefore, we will make use of the rotation matrices 


R. iO 

Rz (?) 


1 0 0 \ / cos? 0 — sin^ \ 

0 cos? sin^ j (^) = j 0 1 0 j 

0 — sin f cos ? y y sin ^ 0 cos ? J 

cos? sin^ 0 \ 

— sin ^ cos ? 0 

0 0 1 / 


and define the different reference frames, namely the global body stroke 
plane and wing in which the geometry is defined. As described above, the 
mask function is constructed in each evaluation of the right hand side as a function 


of the signed distance function, x (a;) = x (<5 (x)), according to eqn. ( 2 . 81 . 


3.1. Body system. The insect’s body is responsible for a major part of the total 
drag force. It is described by its logical center ^^ntn ^^e translational velocity 


and the body angles /? (pitch), 7 (yaw) and ip (roll), see figure 3.1 The center point 
does not necessarily coincide with the center of gravity, it is rather an arbitrary 
point of reference. A point a;^®^ in the global coordinate system can be transformed 
to the body system using the following linear transformation 


= Mbody iP, P, 7 ) - ®cnl) 

(3.1) Mbody =-R.(/')i?y(/3)i?z(7)- 


Since rotation matrices do not commute, it is important to note that the body is first 
yawed, then pitched and Anally rolled, which is conventional in flight mechanics. The 
geometry of the body is defined in the body reference frame. The angular velocity of 
the body in the global system is 


= Mbody 


0 

0 

7 


+ Rp^ m 


0) (^ ) 


which defines the velocity held inside the insect resulting from the body motion, 
(3.2) + M^-^ (ni') X 


Equation (3.2 1 is valid also in the wings, since the flapping motion is prescribed 
relative to the body. 
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Figure 3.1. Model insect with definitions of the body angles 7 (yaw), /3 (pitch), ip (roll), the 
anatomical stroke plane angle p, the wing coordinate systems and the wing angles 9 (deviation), (p 
(position) and a (feathering). All angles are shown with positive sign. 


3.2. Body shape. The body shape is described in the body reference frame 
described previously. For instance, for the body depicted in figure |3.1| which is 
composed of an ellipsoidal shaped thorax and spheres for the head and eyes, the 
signed distance function is the intersection of the distance functions for the thorax, 
head and eyes. 


(3.3) 


^body — max (<5thorax5 ^headj ^ eyes) ■ 


The max operator of the signed distances in eqn. (3.3) represents the intersection 
operator |25]. The signed distances for the components read 


<5thorax 


4ead 

6eyes 



„(&) 



(b) 

— 0 ,eyes 




where a, b define the axes of the thorax ellipsoid and aig^^ead eyes centers of the 

spheres. 
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3.3. Wing system. We consider only insects with two wings, one on each side, 
that are rotating about the pivot points j. and [■ These pivots do not 

necessarily lie on the body surface; we rather allow a gap between wings and body. 
This gap avoids problems with non-solenoidal velocity fields at the wing base. It is 
conventional to introduce a stroke plane, which is a plane tilted with respect to the 
body by an angle ry. The coordinate in the stroke plane reads 

= Mstroke - 4Lt) • 

Here, we use an anatomical stroke angle, that is, the angle rj is defined relative to 
the body. Within the stroke plane, the wing motion is described by the angles a 
(feathering angle or angle of attack), (p (positional or flapping angle), 9 (deviation or 
out-of-stroke angle). Applying two rotation matrices yields the transformation from 
the body to the wing coordinate system: 

= Mwing^^®^ = AfwingMstroke ' 

When flapping symmetrically, i.e., both wings following the same motion protocol, 
the stroke and wing rotation matrices for the left and right wing are given by 


Afstroke.l — Ry (^) Afgtroke.r — Rx ('^) Ry (^) 


Mwing.i = Ry (a) Rz (-9) Rx {4>) 




= Ry (-a) R, (-9) R, i-p) 


due to the rotation R^ (tt) the sign of 9 for the right wing does not have to be inverted. 
The angular velocities of the wings are given by 


q(*') — /Vf-i 


Rx^ W 


+ R-^{-9) 



q (“) — M ■ 


which is used to compute the velocity field resulting from the wing motion, 

«(,-) = Hi™) X 


(3.4) 


ui?) = M-i. M,-} M,-^ 


^body 


stroke — 


(w) 


The total velocity field inside the wings is given as the superposition of the body and 
wing rotation, 

yff'> (x e {^„}) = 

The actual kinematics, i.e., the angles a (t), p (t) and 9 (t) are parametrized by either 
Fourier or Hermite interpolation coefficients and read from a *. ini file. 

3.4. Wing shape. In the previous section we defined the wing reference frame 
in which we now describe the wing’s signed distance function S. In general, we 
define a set of several signed distance functions, each of which describes one surface 
of the wing. The signed distance function of the entire wing is then given by their 
intersection. For some model insects, we consider simple wings for which straightfor¬ 
ward analytical expressions are available. For a rectangular wing, for instance the one 
illustrated in figure 5.2 (c), we find for the signed distance function 

(3.5) < 5 ( 21 ^’")) = max - 6; - (H-6);yW - 1; a|z| - 










Figure 3.2. Realistic wing shapes are described in polar coordinates R{'d). 


For realistic insect wings however, we parametrize the wing shape in polar coordinates. 
As illustrated in figure the shape in the wing plane is described by the center point 
which is arbitrary as long as the function R{'d) is unique for all -d. To sample 
the wing on a computational grid, we need a function R (-d) that can be evaluated for 
all d. As R{'d) is naturally 27r-periodic, a truncated Fourier series can be used: 


(3.6) 


Qi COS (27rii?) + bi sin (27rzi?) 


N 


N 


i=l 


In practice, eqn (3.6 1 has to be evaluated for all grid points in the vicinity of the wing. 


which requires O evaluations with a small constant. The computational 

cost can however be significant, which is why eqn. (3.6 1 is evaluated for 25 000 values 


of once during initialization. Afterwards, linear interpolation is used for its lower 
computational cost. The signed distance for such a wing then reads 



= max ( 

zM 


V 



-h/2-,r{'0)- R{-d) 


If a wing cannot be described by one radius, because R (i?) is not unique, several radii 
and center points can be used |4]. 


3.5. Power requirement. Actuating the wings requires power expenditures 
that are very difficult to measure directly. In numerical simulations, the power can be 
obtained directly, since the aerodynamic torque moment m with respect to the wing 
pivot point is available from eqn. (2.7 The power Paero required to move one wing 
is found to be 


(3.7) Paero = “W ’ {^Iw “ ilfc) 

which is equivalent to the definition Paero = / udP given in m- In addition to the 
aerodynamic power, the inertial power has to be expended, i.e., the power required 
to move the wing in vacuum. As the flapping motion is periodic, its stroke averaged 
value is zero. The inertial power Pinert is positive if the wing is accelerated (power 
consumed) and negative if it is decelerated. The definition is 

Pinert = X 

with the wing tensor of inertia [3]. The sum of inertial and aerodynamic power 
can be negative during deceleration phases, which would mean that the insect can 
store energy in its muscles. It is unknown to what extent this can be realized, and 
it is thus often conservatively assumed that energy storage is not possible, in which 
case the total power Ptotai is given by Ptotai = max (Pinert + Paero, 0 ). 
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3.6. Governing equations in free flight. Until now we have considered the 
insect to be fixed, i.e., tethered in the computational domain. In free flight, additional 
equations have to be solved together with the fluid, namely Newton’s law. The body 
translation is then governed by 

where contains the aerodynamic and gravitational forces and M is the mass of 
the insect. For simplicity, Xcontr Mcentr correspond to the center of gravity in 
the case of free flight. To handle the rotational degrees of freedom, we employ a 
quaternion based formulation, similar to the one proposed in |20| . which avoids the 
‘Gimbal lock’ problem. The governing equation for the angular velocity £2^^^ in the 
body reference frame reads 

/ 0 \ 

V - 4 ^) dd 0 J 


where is the moment of inertia around the body axes and m is 

the vector of torque moments as defined in eqn. (2.7). The skew-symmetric term 


stems from the change into a moving reference frame. We introduce the normalized 
quaternion e with componen 
for the quaternion state are 


with the matrix 


'z, i = 

0,... 

,3, E£f = 

e = 

1 T 



2 = 


-£l 

£0 

£3 -£2 

-£2 

-£3 

£0 £1 

-£3 

£2 

— £1 Eq 


Assuming to be constant the body axes to be the prinicipal axes of inertia (i.e., 
is diagonal), the following first order system set of 13 equations is obtained 


(3.8) 


_d 

dt 


/ 


•^cntr 

J/cnl 

Ja) 

^cntr 

“'cntr.x 

^cntr.y 

y(9) 

^cntr,z 

£0 

£1 

£2 

£3 
(b) 

X 

ib) 
y 

(b) 


„(9) 

^cntr,x 

cntr.v 

(s) 

'cntr,z 


u, 




Fy 

?{ 9 ) , 


( 9 ), 


>/M 

dM -g^ 

(-e^dd - e^dd - e^dd) /2 


; e^d - e:idd 

i(t>) 


£2^z 


+ eo^d - £id 


(b) 


(-£ 2 ^ 

' jib) _ jib)^ 

dy Uz 

^ jib) jib)^ 
d Z dr 


- ejdd + e^dd 

-I- md 

dddd + md 


V [[dd - Fm, 
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Scaling test: isotropic turbulence 


Scaling test: bumblebee 




Figure 4.1. Parallel scaling tests on a large-scale computing cluster of type IBM BlueGene/Q. 
Left: isotropic turbulence simulation without insect, for two different resolutions. Fifty time steps 
have been performed. The reference simulation is the 1024 Tun. Right: scaling test for a bumblebee 
simulation (see section lT^ . Two complete strokes have been computed, and the timing is averaged 
over the second one. The reference simulation is the 2048 cores run. 


which is solved with the same time discretization as the fluid. The rotation matrix 
Mbody is then computed from the quaternion Si 


( 3 . 9 ) Mbody = 


el + el- el- el 2 {eie 2 - £ 3 ^ 0 ) 
2 (£ i £2 + £ 3 £ o ) el-ef+el-e 
2(£i£3~£2£o) 2(£2£3+£i£o) 


2 (£i£3 + £2£o) 

3 2 (£263 — £i£o) 

_ p2 _ 2 I 2 

tn tn to W to 


which replaces the definition in eqn. ( 3 . 1 ) in the free-flight case. The initial values 


at time t = 0 for ei can conveniently be computed from a set of yaw, pitch and roll 
angles, 


( So \ 


f cos (i/>/2) cos (/1/2) cos (7/2)-F sin (?/;/2) sin (/?/2) sin (7/2) \ 

£1 


sin (V'/ 2 ) cos (/ 3 / 2 ) cos (7/2) — cos (i/’/2) sin (/?/2) sin (7/2) 

£2 


cos {'0/2) sin (/ 3 / 2 ) cos (7/2) -|- sin (0/2) cos {/ 3 / 2 ) sin (7/2) 

V £3 y 


y cos ( 0 / 2 ) cos (/ 3 / 2 ) sin (7/2) — sin ( 0 / 2 ) sin (/ 3 / 2 ) cos (7/2) j 


In the actual implementation, we multiply the right hand side of equation ( | 3 . 8 [ ) with 
a six component vector with zeros or ones, to deactivate some degrees of freedom. 

4. Parallel implementation. Until now we described the numerical method 
employed by the FluSI code, which is based on the volume penalization method com¬ 
bined with a pseudo-spectral discretization. This framework allows for a high degree of 
modularization, since all geometry is encoded in the mask function and the solid veloc¬ 
ity field. The framework is intended to be applicable also for higher Reynolds number 
flow, for which small spatial and temporal vortical structures appear. To resolve these 
scales, high resolution and therefore the usage of high-performance computers is re¬ 
quired. To this end, our code is designed to compute on massively parallel machines 
with distributed memory architectures. The parallel implementation is based on the 
MPI protocol and written in F0RTRAN95. 

For the fluid part the P3DFFT library is used to compute the 3 D Fourier transforms 
| 26 | . This library provides a parallel data decomposition framework, and employs the 
FFTW library m for the one-dimensional Fourier transforms. The flow variables are 
stored on the three-dimensional Cartesian computational grid. Each MPI process 
only holds a portion of the total data, and the parallel decomposition is performed 
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on at most two indices, i.e., a pencil decomposition is used. The x-direction is not 
split among processes, the code can thus run on NyN^ processes at most. This 
limitation is however not of practical importance, since NyN^ usually exceeds the 
number of available CPUs by orders of magnitude. Besides the Fourier transforms, 
the operations are local, i.e., they do not require inter-processor communication. The 
parallel scaling test for the fluid module is shown in Fig. 4.1 (left), for the case of an 
isotropic turbulence simulation and two different problem sizes, 256^ and 768^. The 
latter shows good scaling up to 8192 CPU on the IBM BlueGene/Q machin^used 
for the tests. 

The insect module is used to generate the y-function and the solid body velocity 
field Ug. It is crucial that the implementation does not spoil the favorable scaling 
properties. The module is therefore designed around an derived insect datatype called 
diptera, which holds all properties of the insect, such as the wing shape, body position 
and Fourier coefficients of the wing kinematics. This object is of negligible size, and 
therefore each MPI process holds a copy of it. The task to construct the penalization 
term is then again completely local. Forces and torques (eqn. ( 2.6p.7 1) are computed 
efficiently with the MPI_allreduce command. It is further necessary to distinguish 
between distinct parts of the mask function, e.g., body and wings and possibly outer 
boundary conditions, like channel walls. To accomplish this, we introduce the concept 
of mask coloring, i.e., we allocate a second integer*2 array for the mask function that 
holds the value of the mask color. This allows to easily exclude for example channel 
walls from the force computation in eqn. (2.6), and limits the memory footprint to 
two arrays. We further use it to compute the forces acting on the body and wings 
individually. If the insect’s body is tethered, there is no need to reconstruct it every 
time the mask is updated. The mask coloring can in this case be used easily to delete 
only the wings from the mask function while keeping the body in memory. This 
reduces the CPU time requirement of the mask function, since the body occupies a 
relatively large volume (compared to the wings) and consequently a larger number of 
grid points are affected. 

Fig. 4.1 (right) shows the scaling test of an insect simulation. The insect model 
is the bumblebee presented in section but the scaling behavior is the same for any 
other model. We computed two complete strokes at a resolution of 1152 x 768 x 768, 
and the elapsed walltime was measured for the second one. The parallel scaling is 
virtually the same as in the pure fluid case, indicating that the insect module does 
indeed not spoil parallel scaling efficiency. 

The general process for a simulation is summarized in algorithm[2 All parameters, 
like the resolution etc., are read from a *.ini file, which avoids recompiling the 
code every time a parameter is changed. The code writes time series of quantities 
like the fluid forces, enstrophy or aerodynamic power to ascii *. t files. Flow field 
data, such as the velocity and pressure fields, are stored using the hierarchical data 
format (HDF5) library, in order to guarantee maximum compatibility with third-party 
applications, such as the open-source visualization software Paraview, which is used 
for the visualizations shown in the results section. Figure [T^ shows how the computing 
time spent on various tasks is distributed. The compuation is dominated by the 
fluid time stepping (87.8%); the remaing parts are used in the computation of flow 
statistics and field output. Computing the source terms is by far the most significant 
contribution, and the generation of the geometry only consumes about 8% of the total 
time. 


^ http: / / www.idris.fr / eng/turing/ 
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Algorithm 1 General process for a simulation. 

1. Initialization. Read parameters from *. ini file, allocate memory, set up fluid 
initial condition, initialize P3DFFT 

2. Begin loop over time step 

(a) Generate mask function x(x, t") and solid velocity field 

(b) Determine time step At 

(c) Gompute sources F (w”) = — w" x m" — ^ (m” — m") — ^ V x ^ 

(d) Add pressure gradient F" ^ F” - V [(V • F”) /V^] 

(e) Advance fluid to new time level = AB2 («"■,£”) 

(f) Optional: Advance free-flight equations to new time level, using the AB2 
scheme 

(g) Output: write flow statistics and flow field data to disk 

3. When terminal time is reached, free memory and quit. 



Figure 4.2. Partition of computational efforts spent on a bumblebee simulation (see section 
I jg[ ). The resolution is 1152 X 768 X 768. Most efforts are spent on advancing the fluid in time, which 
in turn is largely dominated by the computation of fluid source terms. All per-cent is with respect 
to total execution time. 


5. Validation Tests. For code validation we consider four test cases with in¬ 
creasing complexity, a falling sphere, a rectangular flapping wing, a hovering flight of 
a fruit fly and the free flight of a butterfly model. 

5.1. Falling sphere. The first test case to validate the flow solver is the sedi¬ 
mentation of a sphere, which in our terminology is an insect without wings and with a 
spherical body. We consider case 1 proposed by Mordant and Pinton [^, who studied 
the sedimentation problem experimentally in a water tank. The sphere of unit diam¬ 
eter and mass M = 1.3404 is falling in fluid of viscosity v = 0.0228 and unit density. 
The dimensionless gravity is g = 0.8036 and the terminal settling velocity obtained 
from the experiments is U = 0.9488. We perform a grid convergence test using the do¬ 
main size 8 x 8 x 16, and parameters N^'^Ny x W x C,, of a coarse (96x96x 192 x 10“^), 
medium (192 x 192 x 384 x 2.5 • 10-^) and fine (384 x 384 x 768 x 6.25 • 10"^) grid. 
The number of points per penalization boundary layer is A = 0.0573. The results 
of the convergence study are illustrated in figure |5.1| and the settling velocity for the 
finest resolution differs from the experimental findings by less than 1%. The finest 
resolution required 30 GB of memory and 34 400 GPU hours on 1024 cores to perform 
266 667 time steps. The computational cost is relatively high, since small values of 
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time 


Figure 5.1. Settling velocity of a falling sphere, experimental results by Mordant & Pinton 
and present results for coarse, medium and fine grids. 


Cr, are required as the Reynolds number is small. 


5.2. Validation case of a rectangular flapping wing. We consider the setup 
proposed by Suzuki and co-workers in |84| Appendix B2. It considers only one rect¬ 
angular wing with a finite thickness, neglecting thus the body and the second wing. 
The fact that the thickness is finite and the geometry rather simple, compared to 
actual insects, motivates the choice of this setup. The wing kinematics are given 
by ^ = (j)m cos {2TTt), a = tani^c tanh (ca sin (27rf)) and 9 — Q, where (j)m = 80°, 
am = 45°, Ca = 3.3; the motion is symmetric for the up- and downstroke and de- 
(a-b). The rectangular wing and the wing coordinate system are 
5.2 (c). We normalize the distance from pivot to tip to unity, which 


picted in figure 5.2 
illustrated in figure 


yields a = 1.6667, b = 0.0667, B = 0.4167 and a wing thickness of h = 0.04171. The 
Reynolds number is set to Re = UtipB/v = 100 with Utip = 2Tr(f>m, yielding the kine¬ 
matic viscosity v = 0.0366. In the present simulations, we discretize the domain of size 
3x3x3 using 512 x 512 x 512 points and a penalization parameter of Cr^ = 1.25 • 10“"^ 
(K = 0.365). The reference computation in |34| is performed in a domain of size 
4.16 X 4.16 X 4.16, using a fine grid near the wing and a coarse one in the far-field. 
Based on the resolution of the fine grid. Ax = B/50, the corresponding equidistant 
resolution would be 500^. In our simulations, the body reference point is located at 
•licntr = (l-5i 1-5, 1-7) and coincides with the wing pivot point, thus = 0. The 

orientation of the body coordinate system is given hy ijj = 0°, P = —45°, 7 = 45° 
and rj = —45°, where the 45° yaw angle was added to keep the wing as far away from 
the vorticity sponges as possible. A total of four strokes has been computed, starting 
with a quiescent initial condition, u{x,t = 0) = 0. The outer boundary conditions 
on the domain are homogeneous Dirichlet conditions in ^-direction and a vorticity 
sponge, extending over 32 grid points with Cgp = 10“^, in the remaining ones. The 
simulation required 35 GB of memory and 5785 CPU hours on 1024 cores. A total of 
27 701 time steps was performed. 

The resulting time series of the vertical force is illustrated in figure 5.2 (d). It 


takes the first two wingbeats to develop a periodic state, since the motion is im¬ 
pulsively started, and then the following strokes are almost identical. The present 
solution agrees well with the reference solution, and the relative R.M.S difference is 
IjF — Aref II 2 / ||Xef|l 2 ~ 4% Over the last two periods. 
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Figure 5.2. Flapping rectangular wing, (a) kinematics used in the test case, as given by 
Suzuki et al. \S4l - (b) visualization of the wing kinematics by a chord section (without body, color 
represents time) (c) Geometry of the flapping rectangular wing. Contrary to 1341 , we normalize the 
distance pivot-wing tip to unity, (d) time evolution of the vertical force acting on the wing for the 
last two strokes, with the reference solution presented in \34l- 


5.3. Hovering flight of a fruit fly model. The third validation test compares 
with a hovering fruit fly model, of which numerical simulations have been presented 
by Maeda and Liu m- Their simulations are based on overset grids, i.e., a body- 
fitted grid for the wings (65 x 65 x 11 each) and the body (61 x 61 x 9), as well 
as a background grid (161 x 141 x 127) have been used to solve the incompressible 
Navier-Stokes equations, approximated in the artificial compressibility formulation, 
using a finite volume discretization. 

The fruit fly considered is defined in figure |5.3| The wing length from pivot point 
to wing tip, R = 2.47 mm, the fluid density qj = 1.225 kg/m^ and the wingbeat 
frequency / = 218 Hz are used for normalization. The body, depicted in figure [5(3] 
(a,c), has an elliptical cross-section with center points following an arched centerline 
of radius r^c = 0.94644 centered at = —0.244769, = —0.9301256. The wing 

pivot points are located at ri = (~0-12, ±0.1445, 0.08). To facilitate reproduc¬ 
tion, the supplementary material contains STL files (StereoLithography) of the fruitfly 
geometry, as well the parameter file that can be used to reproduce the results with 
FluSI. The insect is tethered at xi^'’ = (1.6, 1.6, 1.9) in a computational domain of 
size 3.2 X 3.2 x 3.2, discretized with 640 x 640 x 640 Fourier modes and a penalization 
parameter of = 1.15 • 10“^ {K = 0.23). Its body orientation is given by "0 = 0, 
/3 = —45°, 7 = 45° and ry = —45°. The fruit fly hovers, thus the body position and 
orientation do not change in time. A total of four wing beats has been computed, 


15 





































-1 -0.5 0 0.5 


X (global) 





Figure 5.3. Hovering fruit fly, for comparison with ms- N : wingbeat kinematics in the side 
view, (b) time evolution of the wing angles, (c) Drawing of the insect’s body in the plane = 0. 
The body is rotationally symmetric with a circular center line, (d) Drawing of the wing shape, 
together with its pivot point. 


and the initial condition is fluid at rest, u{x,t = 0) = 0. We apply a vorticity sponge 
in the x and y-direction (32 grid points thick with Csp = 10“^) and impose no-slip 
boundary conditions in the z-direction, i.e., we impose floor and ceiling. The simula¬ 
tion required 68 GB of memory and 15 100 CPU hours on 1024 cores. A total of 48 
200 time steps was performed. 

The wing shape is illustrated in figure [53] (d). It has a mean chord = A/i? = 
0.33, which yields with the kinematic viscosity of air, = 1.5-10“^ m^/s the Reynolds 
number Re = UtipCm/v = 2^Rfcm/i^ = 136 where $ = 2.44 rad is the stroke ampli¬ 
tude of the positional angle. The time evolution of the wing kinematics is illustrated 
in figure 5.3 (b). The up- and downstroke are not symmetric, and the wing trajectory, 
figure [573| (a), shows the characteristic U-shape. 

The results for the lift force, normalized with the weight, and the aerodynamic 
power in W/kg body mass are presented in figure 5.4 As the mass of the model 


insect was not given in m, we assumed the fourth stroke of the reference data to 
balance the weight, yielding m = 1.02 mg. Small circles at mid-stroke indicate stroke- 
averaged values. For the stroke-averaged vertical force, we And respectively 10.03%, 
6.19%, 4.03% and 3.26% relative difference to the reference data for the four strokes 
computed, which can be explained by the impulsively started motion at the beginning 
of the first stroke. The time evolution, e.g., the occurrence of peaks, are very similar in 
both data sets. The agreement is even better for the aerodynamic power, with relative 
differences of 0.57%, 0.06%, —0.95% and —1.00% for the stroke-averaged values. Both 
power and lift peak during the translation phase of the up- and downstroke, and reach 
a minimum value at the reversals. 

The flow field generated by the fruit fly model is visualized in figure |5.5| by iso¬ 
surfaces of the Q-criterion, which can, for incompressible flows, be computed as Q = 
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Figure 5.4. Hovering fruit fly, comparison with flTV . Top: total vertical force. Bottom: 
aerodynamic power. Gray shaded areas denote upstrokes, stroke averages are indicated by circles. 
The mean values during the last stroke differ by 3.26% and 1.00% for the vertical force and the 
power, respectively. 



Figure 5.5. Snapshot of the vortical structures generated by a hovering fruit fly model, visual¬ 
ized by iso-surfaces of the Q-criterion (Q = lOOj shortly before the ventral stroke reversal tfT = 0.4. 
Typical feature of flapping flight, such as the leading edge and wingtip vortex, are visible. 


y'^pl2, see |181 p. 23]. The flow fleld exhibits the typical features, such as a leading 
edge vortex and a wingtip vortex. 

5.4. Butterfly model in free flight. To complete the validation section, we 
consider the freely flying butterfly model presented in |341 section 5.3]. The body 
consists of a thin rod with quadratic wings of length R = 1, and the wing kinematics 
is inspired by a butterfly. Figure jS.flj (top row) illustrates the prescribed wingbeat 
kinematics, converted to the convention used in this work. The Reynolds number 
is Re = 300 = UtipR/v, where C/tip = tt is the mean wingtip velocity. The mass 
of the butterfly is M = 38, and the moments of inertia of the body are given by 
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Jy^'^ = Jz°’ = 3.1667. The roll motion is inhibited (five degrees of freedom) and the 
mass of the wings is neglected. Gravitational acceleration is given by g = 0.1304. 
Present computations are performed on a domain of size 6x6x6 using a resolution 
of 1024 X 1024 X 1024 and the penalization constant is Cy = 1.4 • 10“"*^ {K = 0.2). We 
apply a vorticity sponge in all directions, centered around the moving center of the 
insect We computed 9 strokes (65066 time steps), the computation allocated 

200 GB of memory and consumed 79915 GPU hours on 4096 cores. The parameter 
files used in this compuation are in the supplementary material. 


r(6) 


Figure 5.6 (mid row) compares the coordinates of the center of mass and the body 
pitch angle with the reference solution, showing good agreement with the reference 
computation. A slight deviation in the vertical component can be observed, which 
can be explained by the fact that the slight differences in the forces (Fig. |5.6| bottom 
row) are integrated twice with respect to time. The general agreement is good, and 
we conclude that our code is validated. 

6. Application to a bumblebee model in forward flight. In order to 
demonstrate the applicability of the software environment to more complex insects at 
higher Reynolds number, we consider in this chapter a different insect model with a 
Reynolds number of Re = 2^Rfcm/v = 2042. The model is derived from a bumble¬ 
bee and its detailed morphology and kinematics can be found in the supplementary 
material (STL file of the geometry as well as the parameter files that can be used to 
reproduce the results with FluSi). The model has also been used in US suppl. mat.]. 
The body contains details such as the legs and antennae, since they can be easily 
included in the present framework. The bumblebee is considered in forward flight at 
Uoo = 1.246. The domain size is set to 6 x 4 x 4 and discretized with 1152 x 768 x 768 
grid points. The penalization parameter is set to Cy = 2.5 • 10“'* {K = 0.074). Four 
strokes have been simulated, requiring 26467 GPU hours on 1024 cores and 153.6 GB 
of memory. 

Figure 6.1 visualizes the flow field generated by the model by the |jw|| = 100 


isosurface of vorticity magnitude, for the times t/T — 0.3 and 0.95, where T is the 
period time. At t/T = 0.3, the leading edge vortex and the wingtip vortex are clearly 
visible, yet the flow field presents much smaller scales than in the case of a fruitfiy 


(Fig. 5.5). At t/T = 0.95, which is shortly before the beginning of a new cycle, the 
vortex “puff” shed at the stroke reversal {t/T « 0.5) are visible. The flow field is both 
spatially and temporally intermittent. Figure [O] gives the impression of a turbulent 
flow field, which can be quantified by the energy spectra of the velocity in a 2D slice 
perpendicular to the flow direction. Figure [6^ shows the chosen position of the slice at 
X = 2.67, and the radial energy spectra for 10 times throughout the cycle. At any time 
t/T, the energy spectrum U(fc) = 1/2 Efc-i/ 2 <|fe|<fe-si /2 {k)\^ + \uz (fc)| (fc)|^ 

where k = {kx, ky), is full and exhibits an intertial range with a slope, which is 

a typical feature of turbulence. 

As described in section [231 ^ vorticity sponge technique has been used to overcome 
the periodicity. Let us briefly discuss the influence of the sponge and the choice of 
parameters. Figure |6.3| visualizes the flow field from two simulations, one with a 
shorter domain (4x4x4) and the previously discussed one. In the shorter domain, 
a sponge is active in the region 3.5 < x < 4 with a constant of Cgp = 10“*. 

7. Conclusions and Perspectives. This article presents the open source soft¬ 
ware FluSI (https://github.com/pseudospectators/FLUSI) for the numerical simula¬ 
tion of the aerodynamics of flapping insect flight running on massive parallel com- 
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Figure 5.6. Freefly flying butterfly model for comparison with Top: kinematics and 

illustration of the wingbeat. Mid left: coordinates of the center of mass for the horizontal (solid 
lines) and vertical (dashed line) component. Mid right: body pitch angle (3. Bottom: vertical (left) 
and forward force (right) component. Symbols indicate stroke averages. Dash-dotted line marks 
weight. Overall agreement is good. 


puter architectures. Different benchmarks demonstrated the efficiency of the code 
and showed its validity in comparison with results from the literature. The numerical 
method is based on a Fourier pseudo-spectral discretization of the three-dimensional 
incompressible Navier-Stokes equations. Thus no artificial numerical diffusion or dis¬ 
persion are introduced in the discretization. The no-slip boundary conditions for the 
complexly shaped and time varying geometry of the flapping wings and the insect 
body are imposed with the volume penalization method. The penalization parameter 
is chosen such that the modeling error, due to penalization, and the discretization 
error are balanced. The computational cost of the flow solver is essentially due to the 
Fourier transforms. Benefiting from the efficient implementation of three-dimensional 
FFT, excellent scaling on large scale computing clusters has thus been obtained. A 
limitation of the current approach is that equidistant Cartesian grids are required 
and a compromize between the domain size and the number of grid points has to be 
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Figure 6.1. Flow field generated by the model bumblebee in forward flight, visualized by the 
1^11 = 100 isosurface of vorticity magnitude. 



Figure 6.2. Right: Radial energy spectra in a slice perpendicular to the flow direction (x = 
2.67/ The spectra are full and exhibit an inertial- and dissipative range. The resolution is 1152 x 
768 X 768, thus the highest resolved wavenumber is /Cmax = 256 • (27r/4), due to the de-aliasing using 
the 2/3 rule. Right: position of the slice and insect in the computational domain of size 6x4x4. 



Figure 6.3. Influence of the vorticity sponge in a bumblebee simulation. Shown is the ||^|| = 20 
isosurface of vorticity magnitude. A simulation with a longer = 6, blue) and a shorter domain 
( ij . = 4, orange) have been performed. In the shorter simulation, the sponge occupies the region 
3.5 < 3; < 4. The sponge constant is Csp = 10~^. 
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imposed. The modular structure of the FluSI code permits to design different and 
complex geometries of the insect shape and its wings easily and also to change their 
kinematics. For the free flight option, equations in a quaternion-based formulation 
are solved. Different flow configurations, like channel flows with laminar or turbulent 
inflow or including bluff bodies of almost arbitrary shape, are possible using penal¬ 
ization together with a sponge technique. 
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